Abstract

We introduce a novel time-homogeneous Markov embedding of a class of time-inhomogeneous Markov chains widely used in the context of Monte Carlo sampling algorithms, which allows us to answer one of the most basic, yet hard, questions about the practical implementation of these techniques. We also show that this embedding sheds some light on the recent result of [7]. We discuss further applications of the technique.


Keywords: Markov chain Monte Carlo, Metropolis within Gibbs, Peskun order, deterministic scan, random scan.


1 Introduction


It is often said that little time is needed for a novice Markov chain Monte Carlo user to embarrass an expert with apparently simple questions. One such question is the following. Let $\pi$ be a probability distribution defined on some measurable space $(\mathsf{X}, \mathcal{X})$ and for some $k \in \mathbb{N}^*$ let $\mathfrak{P} := \{\Pi_i : \mathsf{X} \times \mathcal{X} \to [0,1],\ i = 1, \ldots, k\}$ be a family of Markov transition kernels assumed to be reversible with respect to $\pi$. Markov chain Monte Carlo methods consist of using these Markov transitions in order to simulate realizations of a Markovian process $\{X_i, i \geq 0\}$ which may be used to approximate expectations of functions $f : \mathsf{X} \to \mathbb{R}$ with respect to $\pi$, $\mathbb{E}_\pi(f(X))$ for $X \sim \pi$, with estimators of the form

$$S_M(f) := \frac{1}{M} \sum_{i=0}^{M-1} f(X_i) .$$

A natural question is how best to use $\mathfrak{P}$ in order to minimize the variability of this estimator. Traditionally there are essentially two approaches to construct such Markov chains from $\mathfrak{P}$. The first one consists of considering the homogeneous Markov chain with transition defined as a mixture of the transitions in $\mathfrak{P}$,

$$P^{\rm rand} := \frac{1}{k} \sum_{i=1}^{k} \Pi_i , \qquad (1)$$

which corresponds to choosing one of the kernels at random at each iteration. The second option consists of cycling through $\mathfrak{P}$ in a deterministic fashion, which defines a non-homogeneous Markov chain. More precisely, for $k \in \mathbb{N}^*$ define the forward circular permutation $\sigma : \{1, \ldots, k\} \to \{1, \ldots, k\}$ such that $\sigma(j) = j + 1$ for $j \in \{1, \ldots, k-1\}$ and $\sigma(k) = 1$, and its powers, starting with $\sigma^0(j) = j$ for $j \in \{1, \ldots, k\}$ and $\sigma^i = \sigma^{i-1} \circ \sigma$ for $i \geq 1$. Then we introduce the sequence of Markov transition probabilities $\{P_i^{\rm strat} = \Pi_{\sigma^{i-1}(1)},\ i \geq 1\}$.
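The two constructions can be sketched on a small finite state space. Everything specific below is an assumption made for illustration only: the four-state target `pi`, the two Metropolis kernels built from symmetric proposals, equal mixture weights $1/k$, and the helper names `metropolis_kernel`, `sigma` and `p_strat`.

```python
import numpy as np

def metropolis_kernel(pi, proposal):
    """Metropolis kernel for a symmetric proposal matrix; reversible w.r.t. pi."""
    accept = np.minimum(1.0, pi[None, :] / pi[:, None])  # accept[x, y] = min(1, pi[y] / pi[x])
    P = proposal * accept
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))  # rejected moves stay in place
    return P

# Hypothetical target and two pi-reversible kernels (illustrative choices).
pi = np.array([0.1, 0.2, 0.3, 0.4])
near = 0.5 * (np.eye(4, k=1) + np.eye(4, k=-1))   # nearest-neighbour proposal
unif = (np.ones((4, 4)) - np.eye(4)) / 3.0        # uniform proposal on the other states
kernels = [metropolis_kernel(pi, near), metropolis_kernel(pi, unif)]
k = len(kernels)

# Random scan: homogeneous chain driven by the equal-weight mixture.
P_rand = sum(kernels) / k

def sigma(j):
    """Forward circular permutation on {1, ..., k} (1-based, as in the text)."""
    return j % k + 1

def p_strat(i):
    """Kernel used at iteration i >= 1 of the deterministic scan: Pi_{sigma^{i-1}(1)}."""
    j = 1
    for _ in range(i - 1):
        j = sigma(j)
    return kernels[j - 1]
```

With $k = 2$ the deterministic scan simply alternates between the two kernels, while the random scan applies `P_rand` at every step.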

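For simplicity in the two scenarios outlined above we use the same notation $\mathbb{P}_\mu(\cdot)$ (resp. $\mathbb{E}_\mu(\cdot)$) for the probability distribution (resp. expectation) of the two Markov chains such that $X_0 \sim \mu$ where $\mu$ is a probability distribution on $(\mathsf{X}, \mathcal{X})$. For example, for $i \geq 1$, with $\mathcal{F}_i := \sigma(X_0, X_1, \ldots, X_i)$ and $A \in \mathcal{X}$,

$$\mathbb{P}_\mu(X_i \in A \mid \mathcal{F}_{i-1}) = P^{\rm rand}(X_{i-1}, A) \quad \text{or} \quad \mathbb{P}_\mu(X_i \in A \mid \mathcal{F}_{i-1}) = P_i^{\rm strat}(X_{i-1}, A) ,$$

depending on the scenario considered. There shall not be room for confusion in what follows. Similarly, letting ${\rm var}_\mu(\cdot)$ be the variance operator corresponding to $\mathbb{E}_\mu(\cdot)$, we define for $P^* = P^{\rm rand}$ or $P^* = P^{\rm strat}$ and $f : \mathsf{X} \to \mathbb{R}$, the asymptotic variance

$${\rm var}(f, P^*) = \lim_{M \to \infty} {\rm var}_\pi\left(M^{1/2} S_M(f)\right) ,$$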


when the limit exists. The novice's question we are interested in here is which of the two schemes one should use in order to minimize the asymptotic variance of the estimator $S_M(f)$. Despite a long interest in ordering Markov chain Monte Carlo methods in terms of such performance measures [5, 8, 2, 10, 7], the novice's question is, to the best of our knowledge, still unanswered and surprisingly hard; we provide here only a partial answer as we show that for $k = 2$ it is always preferable to use the deterministic update; see Theorem 6 for a precise statement. We note that if the measure of performance is the time for convergence to equilibrium then no general conclusions can be made, see [9] and the references therein. Indeed, in tractable scenarios involving the so-called Gibbs sampler, it can be established that neither of the schemes dominates the other uniformly and that the dependence structure of the targeted distribution $\pi$ determines this ordering.

The main idea of our proof consists of embedding the inhomogeneous Markov chain induced by the sequence $\{P_i^{\rm strat}, i \geq 1\}$ into a homogeneous Markov chain, of transition $T$ defined on an extended space, and rewriting the asymptotic variance of the inhomogeneous chain in terms of the resolvent of the operator $T$ corresponding to this embedding chain (Eq. (3)). This leads to a generalization of a well known and simple identity for the asymptotic variance of homogeneous Markov chains. It turns out that in the case $k = 2$ the homogeneous Markov chain defined by $P^{\rm rand}$ can be seen as being the self-adjoint part of the operator $T$.

This, together with another variational representation of the asymptotic variance of Markov chains in terms of their self-adjoint and skew symmetric parts, allows us to answer the novice's question when $k = 2$ (Theorem 6). Along the way we also show that our approach provides us with a very short proof of the very recent and important result of [7] which allows one to compare performance of certain inhomogeneous Markov chains in the case $k = 2$. Our proof sheds some light on the developments in [7] and illustrates the difficulty encountered when trying to establish the result for $k \geq 3$ (Theorem 7).


2 Homogeneous embedding 

The proofs of our results are based on classical Hilbert space techniques. We recall here related definitions which will be useful throughout. For any probability measure $\mu$ on $(E, \mathcal{E})$, a family of Markov kernels $\{\Pi_i : E \times \mathcal{E} \to [0,1],\ i = 1, \ldots, k\}$ and any function $f : E \to \mathbb{R}$ let, whenever the integrals are well-defined,

$$\mu(f) = \int f(x)\, \mu(dx) \quad \text{and} \quad \Pi_q f(x) = \int f(y)\, \Pi_q(x, dy) ,$$

and for $k \geq 2$, by induction,

$$\Pi_{\sigma^{0:k}(q)} f(x) = \int \Pi_{\sigma^{0:k-1}(q)}(x, dy)\, \Pi_{\sigma^{k}(q)} f(y) .$$

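Consider next the spaces of square integrable (and centred) functions defined respectively as

$$L^2(E, \mu) = \{f : E \to \mathbb{R} : \mu(f^2) < \infty\} \quad \text{and} \quad L_0^2(E, \mu) = \{f \in L^2(E, \mu) : \mu(f) = 0\} ,$$

endowed with the inner product defined for any $f, g \in L^2(E, \mu)$ as $\langle f, g \rangle_\mu = \int f(x) g(x)\, \mu(dx)$, and the associated norm $\|f\|_\mu = \sqrt{\langle f, f \rangle_\mu}$. For $\lambda \in (0,1)$ and $f \in L^2(E, \mu)$ we introduce the quantity

$${\rm var}_\lambda(f, P^{\rm strat}) = \|f - \mu(f)\|_\mu^2 + \frac{2}{k} \sum_{q=1}^{k} \sum_{s=1}^{\infty} \lambda^s \left\langle f - \mu(f),\ \Pi_{\sigma^{0:s-1}(q)} \big(f - \mu(f)\big) \right\rangle_\mu , \qquad (2)$$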

which, with an abuse of language, we may refer to as the asymptotic variance. This quantity is well defined as for $f \in L^2(E, \mu)$, $|\langle f, \Pi_{\sigma^{0:s-1}(q)} f \rangle_\mu| \leq \|f\|_\mu^2 < \infty$ for $s \in \mathbb{N}$, while the limit as $\lambda \uparrow 1$ may or may not exist. We first establish an expression for the asymptotic variance of $S_M(f)$ under minimal conditions, which can be informally thought of as $\lim_{\lambda \uparrow 1} {\rm var}_\lambda(f, P^{\rm strat})$. A similar expression was obtained in [4] for the Gibbs sampler and for $k = 2$ in [7] under a slightly stronger assumption.

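Proposition 1. Let $f \in L_0^2(\mathsf{X}, \pi)$ and assume that for $q \in \{1, \ldots, k\}$

$$\sum_{s=1}^{\infty} \langle f, \Pi_{\sigma^{0:s-1}(q)} f \rangle_\pi$$

exists. Then for the inhomogeneous chain defined by $\{P_i^{\rm strat}, i \geq 1\}$,

$$\lim_{M \to \infty} {\rm var}_\pi\left(M^{1/2} S_M(f)\right) = \|f\|_\pi^2 + \frac{2}{k} \sum_{q=1}^{k} \sum_{s=1}^{\infty} \langle f, \Pi_{\sigma^{0:s-1}(q)} f \rangle_\pi .$$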
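As a rough numerical illustration of Proposition 1, the sketch below compares the exact variance of $M^{1/2} S_M(f)$ for a finite $M$ with the limiting expression, truncated at a finite number of lags. The target `pi`, the two Metropolis kernels, the test function `f` and all helper names are assumptions chosen for illustration; indices are 0-based, so `q = 0` corresponds to $\Pi_1$.

```python
import numpy as np

def metropolis_kernel(pi, proposal):
    """Metropolis kernel for a symmetric proposal matrix; reversible w.r.t. pi."""
    accept = np.minimum(1.0, pi[None, :] / pi[:, None])
    P = proposal * accept
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    return P

pi = np.array([0.1, 0.2, 0.3, 0.4])
near = 0.5 * (np.eye(4, k=1) + np.eye(4, k=-1))
unif = (np.ones((4, 4)) - np.eye(4)) / 3.0
kernels = [metropolis_kernel(pi, near), metropolis_kernel(pi, unif)]
k = len(kernels)

f = np.array([1.0, -2.0, 0.5, 3.0])
f = f - pi @ f                        # centre f so that pi(f) = 0

def autocov(q, n_lags):
    """out[s-1] = <f, Pi_q Pi_{sigma(q)} ... Pi_{sigma^{s-1}(q)} f>_pi for s = 1..n_lags."""
    w = pi * f                        # row vector (pi * f), multiplied by kernels from the right
    out = np.empty(n_lags)
    for s in range(n_lags):
        w = w @ kernels[(q + s) % k]
        out[s] = w @ f
    return out

def limit_variance(n_lags=500):
    """Right-hand side of Proposition 1 with the series truncated at n_lags."""
    return pi @ f**2 + (2.0 / k) * sum(autocov(q, n_lags).sum() for q in range(k))

def finite_variance(M):
    """Exact var_pi(M^{1/2} S_M(f)) for the deterministic scan started from pi."""
    cums = [np.cumsum(autocov(q, M - 1)) for q in range(k)]
    # time i is followed by kernels[i % k]; lags m = 1..M-1-i contribute
    cross = sum(cums[i % k][M - 2 - i] for i in range(M - 1))
    return pi @ f**2 + 2.0 * cross / M
```

For this fast-mixing example the two quantities should agree closely once `M` is in the thousands, the discrepancy being of order $1/M$.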

The proof can be found in the appendix. We now embed the inhomogeneous Markov chain of the previous section into a homogeneous Markov chain, which facilitates later analysis. This allows us to find a simple expression for ${\rm var}_\lambda(f, P^{\rm strat})$ (Lemma 2 and Corollary 3) and our result is then a direct consequence of the standard result in Lemma 4. Let $T : \mathsf{X}^k \times \mathcal{X}^{\otimes k} \to [0,1]$ be the Markov transition probability such that

$$T\big(x^{(1)}, \ldots, x^{(k)};\ dy^{(1)} \times \cdots \times dy^{(k)}\big) = \prod_{i=1}^{k} \Pi_i\big(x^{(i)}, dy^{(\sigma(i))}\big)$$

and $\pi^{\otimes k}(dx^{(1)} \times dx^{(2)} \times \cdots \times dx^{(k)}) = \prod_{i=1}^{k} \pi(dx^{(i)})$. We then define the associated time-homogeneous Markov chain $\{(X_i^{(1)}, \ldots, X_i^{(k)}),\ i \geq 0\}$ such that $(X_0^{(1)}, \ldots, X_0^{(k)}) \sim \pi^{\otimes k}$. We note that the sequence $\{X_i^{(\sigma^i(1))},\ i \geq 0\}$ coincides with the non-homogeneous chain defined through $\{P_i^{\rm strat}, i \geq 1\}$ in the introduction (a proof is provided in Lemma 11), and the other embedded chains correspond to the same cycle, but started at different points of the cycle.
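As is customary we will use the same notation for the operator on functions associated to the transition kernel $T$, that is, letting $L^{2,k}(\mathsf{X}, \pi) = \big(L^2(\mathsf{X}, \pi)\big)^k$, we define

$$T : L^{2,k}(\mathsf{X}, \pi) \to L^{2,k}(\mathsf{X}, \pi)$$

such that for any $\varphi = (\varphi_1, \varphi_2, \ldots, \varphi_k) \in L^{2,k}(\mathsf{X}, \pi)$

$$T\varphi = \big(\Pi_1 \varphi_{\sigma(1)},\ \Pi_2 \varphi_{\sigma(2)},\ \ldots,\ \Pi_j \varphi_{\sigma(j)},\ \ldots,\ \Pi_k \varphi_{\sigma(k)}\big) .$$

We will let $T^*$ denote the adjoint of $T$ and endow the vector space $L^{2,k}(\mathsf{X}, \pi)$ with the inner product defined for any $\varphi, \psi \in L^{2,k}(\mathsf{X}, \pi)$ as

$$\langle \varphi, \psi \rangle = \sum_{i=1}^{k} \langle \varphi_i, \psi_i \rangle_\pi .$$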
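On a finite state space with $n$ states, the operator $T$ can be represented as a $kn \times kn$ block matrix acting on the stacked vector $(\varphi_1, \ldots, \varphi_k)$. The sketch below is one possible representation under assumed inputs (a four-state target `pi` and three Metropolis kernels); `embedding_operator` is a hypothetical helper name and indices are 0-based, so $\sigma(j)$ becomes `(j + 1) % k`.

```python
import numpy as np

def metropolis_kernel(pi, proposal):
    """Metropolis kernel for a symmetric proposal matrix; reversible w.r.t. pi."""
    accept = np.minimum(1.0, pi[None, :] / pi[:, None])
    P = proposal * accept
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    return P

pi = np.array([0.1, 0.2, 0.3, 0.4])
near = 0.5 * (np.eye(4, k=1) + np.eye(4, k=-1))
unif = (np.ones((4, 4)) - np.eye(4)) / 3.0
far = np.eye(4)[::-1]                               # proposes the "mirror" state
kernels = [metropolis_kernel(pi, q) for q in (near, unif, far)]
k, n = len(kernels), len(pi)

def embedding_operator(kernels):
    """Block matrix with (T phi)_j = Pi_j phi_{sigma(j)}."""
    k, n = len(kernels), kernels[0].shape[0]
    T = np.zeros((k * n, k * n))
    for j in range(k):
        c = (j + 1) % k                             # sigma(j) with 0-based indices
        T[j * n:(j + 1) * n, c * n:(c + 1) * n] = kernels[j]
    return T

T = embedding_operator(kernels)
weights = np.tile(pi, k)                            # weights of <phi, psi> = sum_i <phi_i, psi_i>_pi

def inner(phi, psi):
    return (weights * phi) @ psi
```

The stacked weights `np.tile(pi, k)` are left invariant by `T`, which reflects the fact that each $\Pi_i$ leaves $\pi$ invariant.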

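The following result allows us to write (2) in terms of the resolvent of $T$. We use the standard convention that $T^0 = I$, the identity operator, the additional convention that for $j \in \mathbb{N}$, $\Pi_{\sigma^{0:-1}(j)} = I$, and define for $\lambda \in (0,1)$ and $\varphi \in L^{2,k}(\mathsf{X}, \pi)$, $(I - \lambda T)^{-1} \varphi = \sum_{i=0}^{\infty} \lambda^i T^i \varphi$.

Lemma 2. Let $f \in L^2(\mathsf{X}, \pi)$ and define $\bar f = \{f\}^k \in L^{2,k}(\mathsf{X}, \pi)$. Then for any $\lambda \in (0,1)$ we have

$$\langle \bar f, (I - \lambda T)^{-1} \bar f \rangle = \sum_{q=1}^{k} \sum_{i=0}^{\infty} \lambda^i \langle f, \Pi_{\sigma^{0:i-1}(q)} f \rangle_\pi .$$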

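Proof. Let $\varphi \in L^{2,k}(\mathsf{X}, \pi)$. We first establish that for any $i \geq 1$ we have for all $j \in \{1, \ldots, k\}$

$$[T^i \varphi]_j = \Pi_j \Pi_{\sigma(j)} \cdots \Pi_{\sigma^{i-1}(j)} \varphi_{\sigma^i(j)} .$$

This is clearly true for $i = 1$. Assume this is true for $i \geq 1$, then

$$[T^{i+1} \varphi]_j = [T^i \circ T \varphi]_j = \Pi_j \Pi_{\sigma(j)} \cdots \Pi_{\sigma^{i-1}(j)} [T\varphi]_{\sigma^i(j)} = \Pi_j \Pi_{\sigma(j)} \cdots \Pi_{\sigma^{i}(j)} \varphi_{\sigma^{i+1}(j)} ,$$

from which we conclude. This implies that for $i \geq 1$

$$\langle \bar f, T^i \bar f \rangle = \sum_{q=1}^{k} \langle f, \Pi_{\sigma^{0:i-1}(q)} f \rangle_\pi .$$

Now with our conventions

$$\langle \bar f, (I - \lambda T)^{-1} \bar f \rangle = \sum_{i=0}^{\infty} \sum_{q=1}^{k} \lambda^i \langle f, \Pi_{\sigma^{0:i-1}(q)} f \rangle_\pi ,$$

where we note that the sum is absolutely convergent. □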
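The identity of Lemma 2 can be checked numerically on a finite state space by comparing a direct linear solve for the resolvent with the truncated double series. The target `pi`, the three Metropolis kernels, the function `f`, the value of `lam` and the truncation level `n_terms` are all assumptions made for this sketch.

```python
import numpy as np

def metropolis_kernel(pi, proposal):
    """Metropolis kernel for a symmetric proposal matrix; reversible w.r.t. pi."""
    accept = np.minimum(1.0, pi[None, :] / pi[:, None])
    P = proposal * accept
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    return P

pi = np.array([0.1, 0.2, 0.3, 0.4])
near = 0.5 * (np.eye(4, k=1) + np.eye(4, k=-1))
unif = (np.ones((4, 4)) - np.eye(4)) / 3.0
far = np.eye(4)[::-1]
kernels = [metropolis_kernel(pi, q) for q in (near, unif, far)]
k, n = len(kernels), len(pi)

# Embedding operator: (T phi)_j = Pi_j phi_{sigma(j)}, 0-based indices.
T = np.zeros((k * n, k * n))
for j in range(k):
    c = (j + 1) % k
    T[j * n:(j + 1) * n, c * n:(c + 1) * n] = kernels[j]

lam = 0.9
f = np.array([1.0, -2.0, 0.5, 3.0])
fbar = np.tile(f, k)

# Left-hand side: <fbar, (I - lam T)^{-1} fbar> via a linear solve.
lhs = (np.tile(pi, k) * fbar) @ np.linalg.solve(np.eye(k * n) - lam * T, fbar)

# Right-hand side: sum_q sum_i lam^i <f, Pi_q ... Pi_{sigma^{i-1}(q)} f>_pi, truncated.
n_terms = 600
rhs = 0.0
for q in range(k):
    w = pi * f
    rhs += w @ f                                  # i = 0 term (empty product is the identity)
    for i in range(1, n_terms):
        w = w @ kernels[(q + i - 1) % k]          # append Pi_{sigma^{i-1}(q)} on the right
        rhs += lam**i * (w @ f)
```

Since $\lambda^{600}$ is negligible for $\lambda = 0.9$, `lhs` and `rhs` should agree up to floating-point error.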

Corollary 3. For any $f \in L_0^2(\mathsf{X}, \pi)$ one can rewrite the asymptotic variance as follows

$${\rm var}_\lambda(f, P^{\rm strat}) = \frac{2}{k} \langle \bar f, (I - \lambda T)^{-1} \bar f \rangle - \|f\|_\pi^2 , \qquad (3)$$

which generalizes the expression for the asymptotic variance in the homogeneous case ($k = 1$) in terms of the resolvent.




We have the following general result (which can be traced back at least to [6, proof of Lemma 3.1]) which leads to a powerful variational representation of the asymptotic variance associated to general Markov transition probabilities. A proof is provided in the supplementary material for completeness.

Lemma 4. Let $(E, \mathcal{E})$ be a measurable space on which we define a probability distribution $\mu$ and a Markov transition probability $\Pi : E \times \mathcal{E} \to [0,1]$, not necessarily reversible, leaving $\mu$ invariant. Then, with $S = (\Pi + \Pi^*)/2$ and $A = (\Pi - \Pi^*)/2$ (the self-adjoint and skew symmetric parts of $\Pi$), for any $\lambda \in (0,1)$ and $f \in L^2(E, \mu)$

$$\langle f, (I - \lambda \Pi)^{-1} f \rangle_\mu = \sup_{g \in L^2(E, \mu)} 2 \langle f, g \rangle_\mu - \langle g, (I - \lambda S) g \rangle_\mu - \lambda^2 \langle A g, (I - \lambda S)^{-1} A g \rangle_\mu .$$

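Corollary 5. As a consequence

$$\langle f, (I - \lambda \Pi)^{-1} f \rangle_\mu \leq \langle f, (I - \lambda S)^{-1} f \rangle_\mu - \lambda^2 \langle A g, (I - \lambda S)^{-1} A g \rangle_\mu \leq \langle f, (I - \lambda S)^{-1} f \rangle_\mu ,$$

where $g = (I - \lambda \Pi^*)^{-1} (I - \lambda S) (I - \lambda \Pi)^{-1} f$.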
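Lemma 4 and Corollary 5 can be illustrated on a finite state space with a non-reversible kernel. In the sketch below the non-reversible kernel is taken to be the product of two assumed Metropolis kernels, which leaves `mu` invariant but is not reversible; the adjoint in $L^2(\mu)$ is computed as $D^{-1} \Pi^{\top} D$ with $D = {\rm diag}(\mu)$. All concrete values and names are illustrative assumptions.

```python
import numpy as np

def metropolis_kernel(pi, proposal):
    """Metropolis kernel for a symmetric proposal matrix; reversible w.r.t. pi."""
    accept = np.minimum(1.0, pi[None, :] / pi[:, None])
    P = proposal * accept
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    return P

mu = np.array([0.1, 0.2, 0.3, 0.4])
near = 0.5 * (np.eye(4, k=1) + np.eye(4, k=-1))
unif = (np.ones((4, 4)) - np.eye(4)) / 3.0
Pi = metropolis_kernel(mu, near) @ metropolis_kernel(mu, unif)   # mu-invariant, not reversible

D = np.diag(mu)
Pi_star = np.linalg.solve(D, Pi.T @ D)       # adjoint of Pi in L2(mu)
S = 0.5 * (Pi + Pi_star)                     # self-adjoint part
A = 0.5 * (Pi - Pi_star)                     # skew symmetric part
I = np.eye(len(mu))
lam = 0.8

def inner(u, v):
    return (mu * u) @ v

def resolvent(M, v):
    """(I - lam M)^{-1} v."""
    return np.linalg.solve(I - lam * M, v)

f = np.array([1.0, -2.0, 0.5, 3.0])
lhs = inner(f, resolvent(Pi, f))

# Maximiser from Corollary 5 and the three quantities being compared.
g = resolvent(Pi_star, (I - lam * S) @ resolvent(Pi, f))
gap = lam**2 * inner(A @ g, resolvent(S, A @ g))
upper = inner(f, resolvent(S, f))
value_at_g = 2.0 * inner(f, g) - inner(g, (I - lam * S) @ g) - gap
```

Here `value_at_g` should reproduce `lhs` (the supremum in Lemma 4 is attained at `g`), while `upper - gap` and `upper` give the two bounds of Corollary 5.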

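Now we consider a direct application of this result which leads to our main result, Theorem 6.

Theorem 6. Let $k = 2$. For any $f \in L_0^2(\mathsf{X}, \pi)$ and $\lambda \in [0,1)$

$${\rm var}_\lambda(f, P^{\rm rand}) \geq {\rm var}_\lambda(f, P^{\rm strat}) .$$

If in addition $f \in L_0^2(\mathsf{X}, \pi)$ satisfies

$$\sum_{q=1}^{k} \sum_{s=1}^{\infty} \left| \langle f, \Pi_{\sigma^{0:s-1}(q)} f \rangle_\pi \right| < \infty , \qquad (4)$$

then

$${\rm var}(f, P^{\rm rand}) \geq {\rm var}(f, P^{\rm strat}) .$$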

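Proof. We let $S = (T + T^*)/2$ and $A = (T - T^*)/2$ be the self-adjoint and skew symmetric parts of $T$. Notice that

$$T\varphi = (\Pi_1 \varphi_2, \Pi_2 \varphi_1) \quad \text{and} \quad T^*\varphi = (\Pi_2 \varphi_2, \Pi_1 \varphi_1) ,$$

where, if needed, the second statement can be established using Lemma 10. Therefore

$$S\varphi = \frac{T + T^*}{2}\varphi = \left( \frac{\Pi_1 + \Pi_2}{2} \varphi_2,\ \frac{\Pi_1 + \Pi_2}{2} \varphi_1 \right) = \big(P^{\rm rand} \varphi_2,\ P^{\rm rand} \varphi_1\big) ,$$

which corresponds to two homogeneous chains run in parallel, each with transition probability $P^{\rm rand}$. We now apply Corollary 5 to $T$ and obtain

$$\langle \bar f, (I - \lambda T)^{-1} \bar f \rangle - \|f\|_\pi^2 \leq \langle \bar f, (I - \lambda S)^{-1} \bar f \rangle - \|f\|_\pi^2$$

and remark that

$$\langle \bar f, (I - \lambda S)^{-1} \bar f \rangle = \sup_{g \in L^{2,2}(\mathsf{X}, \pi)} 2 \langle \bar f, g \rangle - \langle g, (I - \lambda S) g \rangle = 2 \langle f, (I - \lambda P^{\rm rand})^{-1} f \rangle_\pi ,$$

where we have used that for any $g = (g_1, g_2) \in L^{2,2}(\mathsf{X}, \pi)$, $2 \langle \bar f, g \rangle - \langle g, (I - \lambda S) g \rangle = \sum_{i=1}^{2} \big[ 2 \langle f, g_i \rangle_\pi - \langle g_i,\ g_i - \lambda P^{\rm rand} g_{\sigma(i)} \rangle_\pi \big]$ and Lemma 12 twice to establish the two equalities. Therefore

$$\langle \bar f, (I - \lambda T)^{-1} \bar f \rangle - \|f\|_\pi^2 \leq 2 \langle f, (I - \lambda P^{\rm rand})^{-1} f \rangle_\pi - \|f\|_\pi^2$$

and the first statement follows with Corollary 3. For the second statement, since $P^{\rm rand}$ is self-adjoint, $\lim_{\lambda \uparrow 1} {\rm var}_\lambda(f, P^{\rm rand})$ exists and converges to ${\rm var}(f, P^{\rm rand})$ (if finite) and the additional summability condition allows us to conclude in a similar fashion that $\lim_{\lambda \uparrow 1} {\rm var}_\lambda(f, P^{\rm strat}) = {\rm var}(f, P^{\rm strat})$. □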
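The ordering of Theorem 6 can be probed numerically for $k = 2$. The sketch below uses the resolvent expressions appearing in the proof, ${\rm var}_\lambda(f, P^{\rm rand}) = 2\langle f, (I - \lambda P^{\rm rand})^{-1} f\rangle_\pi - \|f\|_\pi^2$ and Corollary 3 for the deterministic scan, on an assumed four-state target with two assumed Metropolis kernels; the random test functions and values of $\lambda$ are illustrative.

```python
import numpy as np

def metropolis_kernel(pi, proposal):
    """Metropolis kernel for a symmetric proposal matrix; reversible w.r.t. pi."""
    accept = np.minimum(1.0, pi[None, :] / pi[:, None])
    P = proposal * accept
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    return P

pi = np.array([0.1, 0.2, 0.3, 0.4])
near = 0.5 * (np.eye(4, k=1) + np.eye(4, k=-1))
unif = (np.ones((4, 4)) - np.eye(4)) / 3.0
kernels = [metropolis_kernel(pi, near), metropolis_kernel(pi, unif)]
k, n = len(kernels), len(pi)

P_rand = sum(kernels) / k
T = np.zeros((k * n, k * n))                     # (T phi)_j = Pi_j phi_{sigma(j)}
for j in range(k):
    c = (j + 1) % k
    T[j * n:(j + 1) * n, c * n:(c + 1) * n] = kernels[j]

def var_rand(f, lam):
    h = np.linalg.solve(np.eye(n) - lam * P_rand, f)
    return 2.0 * (pi * f) @ h - pi @ f**2

def var_strat(f, lam):
    fbar = np.tile(f, k)
    h = np.linalg.solve(np.eye(k * n) - lam * T, fbar)
    return (2.0 / k) * (np.tile(pi, k) * fbar) @ h - pi @ f**2

rng = np.random.default_rng(0)
gaps = []
for _ in range(200):
    f = rng.normal(size=n)
    f -= pi @ f                                  # centre f under pi
    lam = rng.uniform(0.0, 0.99)
    gaps.append(var_rand(f, lam) - var_strat(f, lam))
gaps = np.array(gaps)
```

Every entry of `gaps` should be non-negative up to rounding, in line with the first statement of the theorem.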

For $k = 2$, cycling deterministically through $\mathfrak{P}$ is therefore always better in terms of asymptotic variance than random scanning. A related result was previously known for the Gibbs sampler [4], that is, in the particular scenario where both $\Pi_1$ and $\Pi_2$ are projections, a property essential in order to establish that

$${\rm var}(f, P^{\rm rand}) = 2\, {\rm var}(f, P^{\rm strat}) - {\rm var}_\pi(f) .$$

In the present scenario, from Corollary 5 one can obtain a lower bound on the gap in the inequality, $\lambda^2 \langle A g, (I - \lambda S)^{-1} A g \rangle$, where

$$g = (I - \lambda T^*)^{-1} (I - \lambda S) (I - \lambda T)^{-1} \bar f .$$
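The Gibbs sampler identity quoted above can be checked on a small example where both kernels are conditional-expectation projections. The $2 \times 2$ joint table `joint`, the function `f` and the truncation level are assumptions for this sketch; the asymptotic variances are computed from truncated autocovariance series, which converge geometrically here.

```python
import numpy as np

joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])          # hypothetical pi(x1, x2) on {0, 1}^2
pi = joint.reshape(-1)                  # state (x1, x2) stored at index 2 * x1 + x2

P1 = np.zeros((4, 4))                   # Gibbs update of x1 given x2
P2 = np.zeros((4, 4))                   # Gibbs update of x2 given x1
for x1 in range(2):
    for x2 in range(2):
        for y in range(2):
            P1[2 * x1 + x2, 2 * y + x2] = joint[y, x2] / joint[:, x2].sum()
            P2[2 * x1 + x2, 2 * x1 + y] = joint[x1, y] / joint[x1, :].sum()

f = np.array([1.0, -2.0, 0.5, 3.0])
f = f - pi @ f
var_f = pi @ f**2

def lagged_sum(kernel_cycle, n_lags=2000):
    """sum_{s=1}^{n_lags} <f, K_1 K_2 ... K_s f>_pi, cycling through kernel_cycle."""
    w, total = pi * f, 0.0
    for s in range(n_lags):
        w = w @ kernel_cycle[s % len(kernel_cycle)]
        total += w @ f
    return total

P_rand = 0.5 * (P1 + P2)
var_rand = var_f + 2.0 * lagged_sum([P_rand])
# Proposition 1 with k = 2, so the prefactor 2 / k equals 1.
var_strat = var_f + lagged_sum([P1, P2]) + lagged_sum([P2, P1])
```

For this table `var_rand` and `2 * var_strat - var_f` should coincide up to rounding, and `var_rand` should not be smaller than `var_strat`.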


3 A short proof of the ordering result of [7] 

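In [7] the authors have established that for $k = 2$ the algorithm satisfies a Peskun type result [8, 10].

Theorem 7 (Maire, Douc and Olsson). Let $k = 2$ and consider two pairs of $\pi$-reversible Markov transition probabilities $\mathfrak{P} = \{\Pi_1, \Pi_2\}$ and $\tilde{\mathfrak{P}} = \{\tilde\Pi_1, \tilde\Pi_2\}$. If for any $g \in L^2(\mathsf{X}, \pi)$ and $i \in \{1, 2\}$, $\langle g, (I - \Pi_i) g \rangle_\pi \leq \langle g, (I - \tilde\Pi_i) g \rangle_\pi$, then for any $f \in L^2(\mathsf{X}, \pi)$, ${\rm var}_\lambda(f, \tilde P^{\rm strat}) \leq {\rm var}_\lambda(f, P^{\rm strat})$. If in addition (4) holds for $\mathfrak{P}$ and $\tilde{\mathfrak{P}}$ then ${\rm var}(f, \tilde P^{\rm strat}) \leq {\rm var}(f, P^{\rm strat})$.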

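The proof of this fact is a direct consequence of Lemma 8 below. In order to state this key result it is useful to rewrite the operator $T$ as the composition of elementary operators,

$$\Theta^{\pm 1}, \Delta : L^{2,k}(\mathsf{X}, \pi) \to L^{2,k}(\mathsf{X}, \pi) ,$$

where $\Theta^{\pm 1}$ are the forward and backward circular permutation operators such that for any $\varphi \in L^{2,k}(\mathsf{X}, \pi)$

$$\Theta^{\pm 1} \varphi = \big(\varphi_{\sigma^{\pm 1}(1)},\ \varphi_{\sigma^{\pm 1}(2)},\ \ldots,\ \varphi_{\sigma^{\pm 1}(i)},\ \ldots,\ \varphi_{\sigma^{\pm 1}(k)}\big)$$

and $\Delta$ is the diagonal operator such that for any $\varphi \in L^{2,k}(\mathsf{X}, \pi)$

$$\Delta \varphi = \big(\Pi_1 \varphi_1,\ \ldots,\ \Pi_{k-1} \varphi_{k-1},\ \Pi_k \varphi_k\big) .$$

Then $T = \Delta \circ \Theta$ (see Lemma 10).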

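Lemma 8. Let $T$ and $\tilde T$ be the embedding operators as defined earlier and associated with $\mathfrak{P}$ and $\tilde{\mathfrak{P}}$ given in Theorem 7, define $T(\beta) = \beta T + (1 - \beta) \tilde T$ and $\Delta(\beta) = \beta \Delta + (1 - \beta) \tilde\Delta$ for $\beta \in [0, 1]$. For $\lambda \in [0, 1)$ and $f \in L^2(\mathsf{X}, \pi)$ let $S_{\lambda, f}(\beta) = \langle \bar f, (I - \lambda T(\beta))^{-1} \bar f \rangle$, with $\bar f = \{f\}^k$. Then

$$\partial_\beta S_{\lambda, f}(\beta) = \lambda \left\langle \big(I - \lambda \Theta^{-1} \Delta(\beta)\big)^{-1} \bar f,\ (\Delta - \tilde\Delta) \big(I - \lambda \Theta \Delta(\beta)\big)^{-1} \bar f \right\rangle .$$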
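Proof. We note that $T(\beta) = \Delta(\beta) \circ \Theta$ and have

$$\begin{aligned}
\partial_\beta S_{\lambda, f}(\beta) &= \lambda \left\langle \bar f,\ (I - \lambda T(\beta))^{-1} [T - \tilde T] (I - \lambda T(\beta))^{-1} \bar f \right\rangle \\
&= \lambda \left\langle \bar f,\ (I - \lambda T(\beta))^{-1} (\Delta - \tilde\Delta) \Theta (I - \lambda T(\beta))^{-1} \Theta^{-1} \bar f \right\rangle \\
&= \lambda \left\langle (I - \lambda \Theta^{-1} T(\beta) \Theta^{-1})^{-1} \bar f,\ (\Delta - \tilde\Delta) \Theta (I - \lambda T(\beta))^{-1} \Theta^{-1} \bar f \right\rangle \\
&= \lambda \left\langle (I - \lambda \Theta^{-1} T(\beta) \Theta^{-1})^{-1} \bar f,\ (\Delta - \tilde\Delta) (I - \lambda \Theta T(\beta) \Theta^{-1})^{-1} \bar f \right\rangle .
\end{aligned}$$

The first equality follows from standard arguments (see [10] and [1, Lemma 51] for additional details, noting that reversibility is in fact not required). On the second line we have used the definition of $T$ and $\tilde T$ in terms of $\Delta$, $\tilde\Delta$ and $\Theta$, and $\Theta^{-1} \bar f = \bar f$. On the third line we have used that for $k \geq 1$ the adjoint of $T(\beta)^k$ is $T^*(\beta)^k = (\Theta^{-1} T(\beta) \Theta^{-1})^k$ from Lemma 10, from which we deduce that $[(I - \lambda T(\beta))^{-1}]^* = (I - \lambda \Theta^{-1} T(\beta) \Theta^{-1})^{-1}$. On the fourth line we use the identity

$$\Theta \left( \sum_{k=0}^{\infty} \lambda^k T(\beta)^k \right) \Theta^{-1} = \sum_{k=0}^{\infty} \lambda^k \big(\Theta T(\beta) \Theta^{-1}\big)^k = (I - \lambda \Theta T(\beta) \Theta^{-1})^{-1}$$

and the result follows with $T(\beta) = \Delta(\beta) \Theta$. □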

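Proof of Theorem 7. The derivative of $S_{\lambda, f}(\beta)$ is evidently non-negative if the resolvent type terms $(I - \lambda \Theta \Delta(\beta))^{-1} \bar f$ and $(I - \lambda \Theta^{-1} \Delta(\beta))^{-1} \bar f$ coincide, since for any $\phi \in L^{2,k}(\mathsf{X}, \pi)$, $\langle \phi, (\Delta - \tilde\Delta) \phi \rangle \geq 0$. This is the case for $k = 2$ since in this scenario $\Theta^{-1} = \Theta$, which reduces to a "swap" operator. □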
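A simple way to generate a pair of families satisfying the assumption of Theorem 7 is to take $\Pi_i = (I + \tilde\Pi_i)/2$, a lazy version of $\tilde\Pi_i$, for which $\langle g, (I - \Pi_i) g\rangle_\pi = \tfrac12 \langle g, (I - \tilde\Pi_i) g\rangle_\pi$. The sketch below uses this assumed construction, with assumed Metropolis kernels on four states, to probe the conclusion numerically via Corollary 3; none of these choices comes from [7].

```python
import numpy as np

def metropolis_kernel(pi, proposal):
    """Metropolis kernel for a symmetric proposal matrix; reversible w.r.t. pi."""
    accept = np.minimum(1.0, pi[None, :] / pi[:, None])
    P = proposal * accept
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    return P

pi = np.array([0.1, 0.2, 0.3, 0.4])
n, k = len(pi), 2
near = 0.5 * (np.eye(4, k=1) + np.eye(4, k=-1))
unif = (np.ones((4, 4)) - np.eye(4)) / 3.0
tilde_kernels = [metropolis_kernel(pi, near), metropolis_kernel(pi, unif)]
lazy_kernels = [0.5 * (np.eye(n) + P) for P in tilde_kernels]   # smaller Dirichlet forms

swap = np.roll(np.eye(k), 1, axis=1)      # for k = 2 the circular shift is a swap

def var_strat(kernels, f, lam):
    """Corollary 3: (2 / k) <fbar, (I - lam T)^{-1} fbar> - ||f||^2 with T = Delta o Theta."""
    Delta = np.zeros((k * n, k * n))
    for j, P in enumerate(kernels):
        Delta[j * n:(j + 1) * n, j * n:(j + 1) * n] = P
    T = Delta @ np.kron(swap, np.eye(n))
    fbar = np.tile(f, k)
    h = np.linalg.solve(np.eye(k * n) - lam * T, fbar)
    return (2.0 / k) * (np.tile(pi, k) * fbar) @ h - pi @ f**2

rng = np.random.default_rng(0)
diffs = []
for _ in range(200):
    f = rng.normal(size=n)
    f -= pi @ f
    lam = rng.uniform(0.0, 0.99)
    diffs.append(var_strat(lazy_kernels, f, lam) - var_strat(tilde_kernels, f, lam))
diffs = np.array(diffs)
```

All entries of `diffs` should be non-negative up to rounding: the family with the larger Dirichlet forms yields the smaller variance.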

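Remark 9. We may wonder whether the result of Theorem 7 holds true for $k \geq 3$. To that purpose assume that the Markov transitions in $\mathfrak{P}$ and $\tilde{\mathfrak{P}}$ coincide, except for the $i$-th element, and notice that with $\Pi_j(\beta) = [\Delta(\beta)]_j$,

$$\left[ (I - \lambda \Theta \Delta(\beta))^{-1} \bar f \right]_i = \sum_{l=0}^{\infty} \lambda^l\, \Pi_{\sigma^{1:l}(i)}(\beta) f$$

and

$$\left[ (I - \lambda \Theta^{-1} \Delta(\beta))^{-1} \bar f \right]_i = \sum_{l=0}^{\infty} \lambda^l\, \Pi_{\sigma^{-1:-l}(i)}(\beta) f ,$$

where $\Pi_{\sigma^{1:l}(i)}(\beta) = \Pi_{\sigma(i)}(\beta) \cdots \Pi_{\sigma^{l}(i)}(\beta)$ and $\Pi_{\sigma^{-1:-l}(i)}(\beta) = \Pi_{\sigma^{-1}(i)}(\beta) \cdots \Pi_{\sigma^{-l}(i)}(\beta)$. Then the derivative of $S_{\lambda, f}(\beta)$ reduces to

$$\partial_\beta S_{\lambda, f}(\beta) = \lambda \left\langle \sum_{l=0}^{\infty} \lambda^l\, \Pi_{\sigma^{-1:-l}(i)}(\beta) f,\ (\Pi_i - \tilde\Pi_i) \sum_{l=0}^{\infty} \lambda^l\, \Pi_{\sigma^{1:l}(i)}(\beta) f \right\rangle_\pi$$

and this term is non-negative as soon as the two sums coincide, which is the case if $k = 2p - 1$ for some $p \in \mathbb{N}^*$, $i = p$ and $(\Pi_1, \Pi_2, \ldots, \Pi_k) = (Q_p, Q_{p-1}, \ldots, Q_2, Q_1, Q_2, \ldots, Q_p)$ for a family of transition probabilities $\{Q_i : \mathsf{X} \times \mathcal{X} \to [0,1],\ i = 1, \ldots, p\}$. This corresponds to a known reversibilisation strategy of the Gibbs sampler, albeit for the operator $Q_1 Q_2 \cdots Q_p$. We note that this also holds in the case where $k = 2p - 2$ for some $p \in \mathbb{N}^*$, the cycle $(\Pi_1, \Pi_2, \ldots, \Pi_k) = (Q_{p-1}, \ldots, Q_2, Q_1, Q_2, \ldots, Q_p)$ and both $i = p - 1$ and $i = 2p - 2$.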
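The symmetry argument of Remark 9 can be illustrated for $k = 3$: with a palindromic cycle $(Q_2, Q_1, Q_2)$ and $i = 2$, the forward and backward resolvent components coincide, whereas for a generic cycle they typically do not. The kernels, the function `f`, the value `lam = 0.3` and the helper `resolvent_component` are assumptions for this sketch; indices are 0-based, so $i = 2$ becomes `i = 1`.

```python
import numpy as np

def metropolis_kernel(pi, proposal):
    """Metropolis kernel for a symmetric proposal matrix; reversible w.r.t. pi."""
    accept = np.minimum(1.0, pi[None, :] / pi[:, None])
    P = proposal * accept
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    return P

pi = np.array([0.1, 0.2, 0.3, 0.4])
near = metropolis_kernel(pi, 0.5 * (np.eye(4, k=1) + np.eye(4, k=-1)))
unif = metropolis_kernel(pi, (np.ones((4, 4)) - np.eye(4)) / 3.0)
far = metropolis_kernel(pi, np.eye(4)[::-1])

def resolvent_component(cycle, i, f, lam, forward=True):
    """Component i of (I - lam Theta^{+1 or -1} Delta)^{-1} fbar."""
    k, n = len(cycle), cycle[0].shape[0]
    Delta = np.zeros((k * n, k * n))
    for j, P in enumerate(cycle):
        Delta[j * n:(j + 1) * n, j * n:(j + 1) * n] = P
    Theta = np.kron(np.roll(np.eye(k), 1, axis=1), np.eye(n))   # (Theta phi)_j = phi_{sigma(j)}
    M = Theta if forward else Theta.T                            # Theta^{-1} = Theta^T
    h = np.linalg.solve(np.eye(k * n) - lam * M @ Delta, np.tile(f, k))
    return h[i * n:(i + 1) * n]

f = np.array([1.0, -2.0, 0.5, 3.0])
lam = 0.3

Q1, Q2 = near, unif
palindromic = [Q2, Q1, Q2]          # (Pi_1, Pi_2, Pi_3) with p = 2 and i = p = 2
fwd_pal = resolvent_component(palindromic, 1, f, lam, forward=True)
bwd_pal = resolvent_component(palindromic, 1, f, lam, forward=False)

generic = [near, unif, far]         # no symmetry around the second element
fwd_gen = resolvent_component(generic, 1, f, lam, forward=True)
bwd_gen = resolvent_component(generic, 1, f, lam, forward=False)
```

The first pair agrees to machine precision, while the second pair differs, which is the obstruction discussed in the remark.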
4 Conclusions and perspectives

We have introduced a novel time-homogeneous Markov embedding of a class of time-inhomogeneous Markov chains widely used in the context of Monte Carlo sampling techniques. We have shown that this approach allows one to rapidly prove new, or recently established, results by leveraging existing techniques or known results for homogeneous Markov chains. We suggest two possible directions for further research. In [11] the author extended the celebrated Kipnis and Varadhan results on the central limit theorem for reversible Markov chains to the general scenario; our approach offers the promise to be able to extend those results to the inhomogeneous Markov chains considered in the present paper. Another possible avenue of research is concerned with finding bounds on the convergence to stationarity in terms of e.g. total variation distance. For example one could attempt to extend the results of Fill [3, Theorem 2.1] which rely on the multiplicative reversibilisation of non-reversible Markov chains and the assumption that $\mathsf{X}$ is a finite discrete space.


A Appendix 


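Proof of Proposition 1. Let $f \in L_0^2(\mathsf{X}, \pi)$, then for $n \geq 1$

$$\mathbb{E}_\pi\left[ \Big( \sum_{i=0}^{n-1} f(X_i) \Big)^2 \right] = n\, \mathbb{E}_\pi\left[ f^2(X_0) \right] + 2 \sum_{0 \leq i < j \leq n-1} \mathbb{E}_\pi\left[ f(X_i) f(X_j) \right]$$

and we focus on the second term. We rewrite it as

$$\begin{aligned}
\sum_{0 \leq i < j \leq n-1} \mathbb{E}_\pi\left[ f(X_i) f(X_j) \right] &= \sum_{0 \leq i < j \leq n-1} \mathbb{E}_\pi\left[ f(X_i)\, \Pi_{\sigma^{i:j-1}(1)} f(X_i) \right] \\
&= \sum_{0 \leq i < j \leq n-1} \langle f, \Pi_{\sigma^{i:j-1}(1)} f \rangle_\pi \\
&= \sum_{0 \leq i < j \leq n-1} \langle f, \Pi_{\sigma^{0:j-i-1}(\sigma^i(1))} f \rangle_\pi \\
&= \sum_{0 \leq i < n-1}\ \sum_{m=1}^{n-1-i} \langle f, \Pi_{\sigma^{0:m-1}(\sigma^i(1))} f \rangle_\pi ,
\end{aligned}$$

where $\Pi_{\sigma^{i:j-1}(1)} = \Pi_{\sigma^{i}(1)} \cdots \Pi_{\sigma^{j-1}(1)}$. Now we have (the term $i = 0$ is treated similarly, but separately)

$$\frac{1}{n} \sum_{0 < i < n-1}\ \sum_{m=1}^{n-1-i} \langle f, \Pi_{\sigma^{0:m-1}(\sigma^i(1))} f \rangle_\pi = \sum_{q=1}^{k} \frac{\lfloor (n-2-q)/k \rfloor + 1}{n} \cdot \frac{1}{\lfloor (n-2-q)/k \rfloor + 1} \sum_{p:\ 0 < pk + q < n-1}\ \sum_{m=1}^{n-1-pk-q} \langle f, \Pi_{\sigma^{0:m-1}(\sigma^q(1))} f \rangle_\pi ,$$

and we conclude by letting $n \to \infty$ and using a Cesàro sum argument for each $q \in \{1, \ldots, k\}$.

□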

Operator $T$ is not self-adjoint, but one can easily determine the expression for its adjoint $T^*$ in terms of $\Theta$ and $\Delta$ or $T$ (visualising $T$ as a block diagonal matrix may be helpful).


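Lemma 10. We have that

1. the adjoint of $\Theta$ is $\Theta^* = \Theta^{-1}$,

2. $\Delta^* = \Delta$, that is $\Delta$ is self-adjoint,

3. $T = \Delta \circ \Theta$ and the adjoint of $T$ is $T^* = \Theta^{-1} \circ \Delta = \Theta^{-1} \circ T \circ \Theta^{-1}$.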

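Proof. The first result follows from

$$\langle \Theta \varphi, \psi \rangle = \sum_{i=1}^{k} \langle \varphi_{\sigma(i)}, \psi_i \rangle_\pi = \sum_{j=1}^{k} \langle \varphi_j, \psi_{\sigma^{-1}(j)} \rangle_\pi = \langle \varphi, \Theta^{-1} \psi \rangle .$$

The second result is direct and the third result follows from the general fact that $T^* = \Theta^* \circ \Delta^*$ followed by an application of the first two results of the lemma. We conclude. □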
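The three statements of Lemma 10 can be verified on a finite state space, where the adjoint of a matrix $M$ with respect to the weighted inner product is $W^{-1} M^{\top} W$ with $W$ the diagonal matrix of stacked weights. The target, the three Metropolis kernels and the helper `adjoint` below are assumptions for this sketch.

```python
import numpy as np

def metropolis_kernel(pi, proposal):
    """Metropolis kernel for a symmetric proposal matrix; reversible w.r.t. pi."""
    accept = np.minimum(1.0, pi[None, :] / pi[:, None])
    P = proposal * accept
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    return P

pi = np.array([0.1, 0.2, 0.3, 0.4])
near = 0.5 * (np.eye(4, k=1) + np.eye(4, k=-1))
unif = (np.ones((4, 4)) - np.eye(4)) / 3.0
far = np.eye(4)[::-1]
kernels = [metropolis_kernel(pi, q) for q in (near, unif, far)]
k, n = len(kernels), len(pi)

# Diagonal operator: (Delta phi)_j = Pi_j phi_j.
Delta = np.zeros((k * n, k * n))
for j, P in enumerate(kernels):
    Delta[j * n:(j + 1) * n, j * n:(j + 1) * n] = P

# Forward circular permutation operator: (Theta phi)_j = phi_{sigma(j)}.
Theta = np.kron(np.roll(np.eye(k), 1, axis=1), np.eye(n))
Theta_inv = np.linalg.inv(Theta)

T = Delta @ Theta
W = np.diag(np.tile(pi, k))          # weights of <phi, psi> = sum_i <phi_i, psi_i>_pi

def adjoint(M):
    """Adjoint of M for the weighted inner product: W^{-1} M^T W."""
    return np.linalg.solve(W, M.T @ W)
```

Comparing `adjoint(T)` with `Theta_inv @ Delta` and `Theta_inv @ T @ Theta_inv` reproduces the third statement.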


B Supplementary material 

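We let $\bar{\mathbb{P}}(\cdot)$ denote the probability distribution of the Markov chain defined by $T$.

Lemma 11. For $m \geq 0$ and $A_0, A_1, \ldots, A_m \in \mathcal{X}$ we have

$$\bar{\mathbb{P}}\big(X_0^{(1)} \in A_0,\ X_1^{(\sigma(1))} \in A_1,\ \ldots,\ X_m^{(\sigma^m(1))} \in A_m\big) = \mathbb{P}_\pi\big(X_0 \in A_0,\ X_1 \in A_1,\ \ldots,\ X_m \in A_m\big) .$$

Proof. By construction for $A \in \mathcal{X}$, $\bar{\mathbb{P}}\big(X_0^{(1)} \in A\big) = \pi(A) = \mathbb{P}_\pi(X_0 \in A)$ and for $i \geq 1$ component $\sigma^i(1)$ is generated by kernel $\Pi_{\sigma^{i-1}(1)}$ and with

$$\mathcal{G}_i = \sigma\big( (X_m^{(1)}, X_m^{(2)}, \ldots, X_m^{(k)}),\ 0 \leq m \leq i \big)$$

we have

$$\bar{\mathbb{P}}\big(X_i^{(\sigma^i(1))} \in A \mid \mathcal{G}_{i-1}\big) = \Pi_{\sigma^{i-1}(1)}\big(X_{i-1}^{(\sigma^{i-1}(1))}, A\big)$$

and we conclude. □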


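Proof of Lemma 4. We have

$$\begin{aligned}
\langle f, [I - \lambda \Pi]^{-1} f \rangle_\mu &= \left\langle (I - \lambda \Pi)(I - \lambda \Pi)^{-1} f,\ (I - \lambda \Pi)^{-1} f \right\rangle_\mu \\
&= \left\langle (I - \lambda S)(I - \lambda \Pi)^{-1} f,\ (I - \lambda \Pi)^{-1} f \right\rangle_\mu \\
&= \sup_{h \in L^2(E, \mu)} 2 \left\langle (I - \lambda \Pi)^{-1} f, h \right\rangle_\mu - \left\langle h, (I - \lambda S)^{-1} h \right\rangle_\mu \\
&= \sup_{h \in L^2(E, \mu)} 2 \left\langle f, (I - \lambda \Pi^*)^{-1} h \right\rangle_\mu - \left\langle h, (I - \lambda S)^{-1} h \right\rangle_\mu \\
&= \sup_{g \in L^2(E, \mu)} 2 \langle f, g \rangle_\mu - \left\langle (I - \lambda \Pi^*) g,\ (I - \lambda S)^{-1} (I - \lambda \Pi^*) g \right\rangle_\mu \\
&= \sup_{g \in L^2(E, \mu)} 2 \langle f, g \rangle_\mu - \langle g, (I - \lambda S) g \rangle_\mu - \left\langle \lambda A g,\ (I - \lambda S)^{-1} \lambda A g \right\rangle_\mu ,
\end{aligned}$$

where we have used that $\Pi = S + A$, $\Pi^* = S - A$, that for any $g \in L^2(E, \mu)$, $\langle g, A g \rangle_\mu = - \langle A g, g \rangle_\mu = 0$, Lemma 12 for the self-adjoint operator $(I - \lambda S)$, $[(I - \lambda \Pi)^{-1}]^* = (I - \lambda \Pi^*)^{-1}$, set $g = (I - \lambda \Pi^*)^{-1} h$ and again used the property $\langle g, A g \rangle_\mu = 0$. From Lemma 12 the supremum on the third line is attained for $h = (I - \lambda S)(I - \lambda \Pi)^{-1} f$, which translates into $g = (I - \lambda \Pi^*)^{-1} (I - \lambda S)(I - \lambda \Pi)^{-1} f$ on the last line. Consequently, using again Lemma 12 for the operator $I - \lambda S$ we deduce

$$\langle f, [I - \lambda \Pi]^{-1} f \rangle_\mu \leq \langle f, [I - \lambda S]^{-1} f \rangle_\mu - \lambda^2 \left\langle A g, (I - \lambda S)^{-1} A g \right\rangle_\mu .$$

□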

The following provides a useful variational representation of the quadratic form of the inverse of a positive self-adjoint operator, attributed to Bellman, and used for example by [2].

Lemma 12. Let $A$ be a self-adjoint operator on a Hilbert space $\mathcal{H}$, satisfying $\langle f, A f \rangle \geq 0$ for all $f \in \mathcal{H}$ and such that the inverse $A^{-1}$ exists. Then

$$\langle f, A^{-1} f \rangle = \sup_{g \in \mathcal{H}} \left[ 2 \langle f, g \rangle - \langle g, A g \rangle \right] ,$$

where the supremum is attained with $g = A^{-1} f$.
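In finite dimensions Lemma 12 reduces to a statement about a symmetric positive definite matrix, which is easy to check numerically. The matrix `A`, the vector `f` and the random trial points below are assumptions for this sketch, using the Euclidean inner product on $\mathbb{R}^5$.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(5, 5))
A = B.T @ B + np.eye(5)              # symmetric positive definite, hence invertible
f = rng.normal(size=5)

def objective(g):
    """2 <f, g> - <g, A g>, the functional maximised in Lemma 12."""
    return 2.0 * f @ g - g @ A @ g

g_star = np.linalg.solve(A, f)       # claimed maximiser A^{-1} f
quad = f @ g_star                    # <f, A^{-1} f>

# Perturbing the maximiser can only decrease the objective.
trials = [objective(g_star + rng.normal(size=5)) for _ in range(100)]
```

Since `objective(g_star + d)` equals `quad - d @ A @ d`, every perturbed value falls strictly below `quad`.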